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We consider parallel simulations for asynchronous systems employing L processing elements that 
are arranged on a ring. Processors communicate only among the nearest neighbors and advance 
their local simulated time only if it is guaranteed that this does not violate causality. In simulations 
with no constraints, in the infinite L-limit the utilization scales (Korniss et al, PRL 84, 2000); but, 
the width of the virtual time horizon diverges (i.e., the measurement phase of the algorithm does 
not scale). In this work, we introduce a moving A-window global constraint, which modifies the 
algorithm so that the measurement phase scales as well. We present results of systematic studies in 
which the system size (i.e., L and the volume load per processor) as well as the constraint are varied. 

The A-constraint eliminates the extreme fluctuations in the virtual time horizon, provides a bound 
on its width, and controls the average progress rate. The width of the A-window can serve as a tuning 
parameter that, for a given volume load per processor, could be adjusted to optimize the utilization so 
as to maximize the efficiency. This result may find numerous applications in modeling the evolution 
of general spatially extended short-range interacting systems with asynchronous dynamics, including 
dynamic Monte Carlo studies. 

PACS numbers: 89.90.+n, 02.70.-c, 05.40.-a, 68.35.Ct 


I. INTRODUCTION 


Parallel Discrete Event Simulations (PDES) of asyn¬ 
chronous systems is a computer science term that stands 
for parallel simulations of complex systems with asyn¬ 
chronous dynamics. Such spatially extended complex in¬ 
teracting systems appear across a wide range of fields in 
natural sciences, and their examples include interacting 
spin systems in material physics, activated processes in 
chemistry, contact processes in stochastic epidemic mod¬ 
els, stochastic market models in finance, scheduling call 
arrivals in communication networks, and routing prob¬ 
lems in internet traffic, to mention just a few applica¬ 
tions of PDES. Despite active research in this area [:l|, ||] 
very few of the PDES techniques have filtered through to 
the physics community. Even the simplest random-site 
update Monte Carlo schemes [|j, where updates corre¬ 
spond to Poisson-random discrete events, were long be¬ 
lieved to be inherently serial (at least in the physics com¬ 
munity). Simulation studies of parallel computations for 
asynchronous distributed systems date back more than 
two decades ago i 1- However, it was Lubachevsky’s 
work §| on parallel simulations of dynamic Ising spin 
systems which shed a new light on this old problem and 
showed how to efficiently perform conservative simula¬ 
tions on a parallel computer. The design of efficient al¬ 
gorithms that would allow modeling of asynchronous sys- 
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terns in a parallel processing environment is even more 
important nowadays, when parallel architectures have 
become generally available. The architectures of today 
may consist of several thousands of processing elements, 
but the size of future systems may be of the order of hun¬ 
dred thousands ||. Such architectures pose new ques¬ 
tions of algorithm efficiency and scalability in large scale 
massively parallel processing. We address these ques¬ 
tions for conservative PDES, using the tools of modern 
statistical physics, in particular, those of non-equilibrium 
surface growth Q. 

The difficulties of parallelizing spatially extended asyn¬ 
chronous cellular automata arise because in asynchronous 
systems the discrete events are not synchronized by a 
global clock. For example, in the basic dynamic Ising 
model for ferromagnets discrete spins with two states 
each are placed on a lattice. The discrete events are 
attempted spin flips, where the spin-flip probability at 
some site depends on the energy states of the neighbor¬ 
ing sites. The lattice can be partitioned into a number 
of sub-lattices, and each sub-lattice may be assigned to 
a different processor. Processing elements (PE) attempt 
a number of randomly chosen spin-flips, and communi¬ 
cate with each other in some update attempts (a discrete 
event). Each PE carries its own local virtual time which 
is advanced by every update attempt. The local virtual 
time on a PE is the simulated time at the spins on its 
sub-lattice. In the conservative PDES implementation it 
is ensured that causality is not violated before each PE 
makes an update attempt. Alternatively, in an optimistic 
PDES implementation, PEs make updates without com- 
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municating with the neighbors, thus sometimes causing 
causality errors. The optimistic scheme provides a re¬ 
covery mechanism by undoing the effects of all events 
that have been precessed prematurely. Optimistic PDES 
have been an object of theoretical and simulation studies 
0 0 0 0 0 | . The development of spatio-temporal 
correlations and self-organized criticality have been re¬ 
cently studied in the optimistic simulations of the dy¬ 
namics of Ising spin systems [00- The conservative 
scheme has been used recently to model magnetization 
switching 17 , ballistic particle deposition |18(| , and a 
dynamic phase transition in highly anisotropic thin-film 
ferromagnets 00- These recent applications suggest 
that the conservative scheme should be particularly ef¬ 
ficient when applied to large systems with short-range 
interactions. 

Early efficiency studies of the conservative scheme 
00 do not identify the mechanism which would en¬ 
sure the scalability of the PDES for an arbitrary sys¬ 
tem size. Recently, Korniss et al |23j introduced a new 
approach that exploits an analogy between the virtual 
time horizon and a fluctuating surface that grows in 
a deposition process. In this picture, the fraction of 
non-idling PEs (the utilization) exactly corresponds to 
the density of local minima in a virtual time surface. 
They showed that, in the case of one spin site per PE, 
the steady state virtual time surface is governed by the 
Edwards-Wilkinson Hamiltonian, implying that the uti¬ 
lization does not vanish for an infinitely large system of 
PEs. Ignoring communication delays and assuming that 
the utilization is equivalent to the efficiency, they con¬ 
cluded that the computation phase of short-ranged con¬ 
servative PDES is asymptotically scalable. In general, 
the utilization should not be taken as a sole measure of 
efficiency in the modeling of asynchrono us sy stems. The 


same analysis of a virtual time surface |23 


demonstrated that, in the case of one site per PE, the 
virtual time horizon infinitely roughens in the infinite 
PE-limit. The statistical spread of the virtual time sur¬ 
face severely limits an averaging or measurement phase 
of PDES, and divergence leads to severe difficulties with 
data management. Therefore, while the simulation phase 
(as determined by the utilization studies) is asymptoti¬ 
cally scalable, the measurement phase is not. To en¬ 
sure the efficiency of the algorithm, solutions need to be 
sought in which both phases of the computation are scal¬ 
able. 


In studies of asynchronous updates in large parallel 
systems, Greenberg et al |27j] proposed a K -random con¬ 
nection model, where at each time step each PE ran¬ 
domly connects with K other PEs in the system. The 
virtual time horizon for this model is short-range cor¬ 
related and has a finite width in the infinite PE-limit. 
Encouraged by this result, we considered the two alterna¬ 
tive modifications to the conservative scheme: a random 
connection model |0| and the moving window constraint 
p4[ . The purpose of these modifications is to ensure that 
the measurement phase of the conservative PDES is scal¬ 


able. 

This paper presents the results of systematic simula¬ 
tion studies of conservative asynchronous PDES with the 
moving window constraint (i.e., simulation studies of the 
simulations). In Section j[| we define terminology and we 
outline both the basic conservative update scheme and 
the constraint update rule that we use in modeling of 
asynchronous PDES. The scheme that we consider is such 
that the evolution of the time horizon is decoupled from 
the underlying systems. The only one assumption that 
we make about underlying complex systems is that they 
are characterized by short-range interactions. Therefore, 
our analysis is generally valid for a wide spectrum of 
physics applications. Section 111 contains the analysis 
of scalability issues, which is based on analogies between 
PDES and kinetic roughening in non-equilibrium surface 
growth. In the PDES analysis the focus is placed on two 
major issues: the scaling of the simulation phase and 
the scaling of the measurement phase. In Section IV we 
present and analyze numerical data that were obtained 
in large scale simulations of an asynchronous conserva¬ 
tive PDES scheme with a moving window constraint. In 
our time evolution studies, we simultaneously varied the 
width of the moving window and the system size (i.e., the 
number of processing elements in the system as well as 
the number of volume elements per processing element) 
in search for regularities that would allow general con¬ 
clusions. In Section |v|, we discuss connections between 
the scalability of a constrained conservative scheme and 
the design of highly efficient algorithms for asynchronous 
systems. 


II. CONSERVATIVE PDES FOR SPATIALLY 

DECOMPOSABLE CELLULAR AUTOMATA 

We consider an ideal system consisting of L identical 
PEs, arranged on a ring. Each PE has Ny lattice sites 
(or operation volumes) and the algorithm randomly picks 
one of the Ny sites. If the site that is picked is an end site 
communication with a neighboring PE is required, while 
no communication between PEs is required if an interior 
site is picked. For this system a discrete event means an 
update attempt that takes place instantaneously. The 
state of the system does not change between update at¬ 
tempts. Processing elements perform operations concur¬ 
rently, however, update attempts are not synchronized 
by a global clock. Such a system can represent, for ex¬ 
ample, concurrent operations of random spin flipping in 
a large spatially extended ensemble that can be arranged 
on a regular lattice. In this picture, the ensemble is spa¬ 
tially decomposed into L subsystems, each of which car¬ 
ries Ny sites. Each subsystem is carried by one PE and 
the required communication is the exchange of informa¬ 
tion about states of the border spins. In the simplest case 
there is one site per PE, Ny = 1, the system is a closed 
spin chain, and a spin-flip attempt at the fc-th PE de¬ 
pends on the two nearest-neighbor spins located on the 
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FIG. 1: Short range connections in PDES for a linear chain. 


(k — l)-th and the (k + l)-th PEs. The fc-th PE may 
not perform an update until it receives information from 
these neighboring PEs. 

In this conservative PDES scheme, to simulate asyn¬ 
chronous dynamics employing L processors, each £:-th 
PE generates its own local simulated time r k for the 
next update attempt. Update attempts are simulated 
as independent Poisson-random processes, in which the 
fc-th local time increment (i.e., the random time inter¬ 
val between two successive attempts ) is exponentially 
distributed with unit mean. A processor is allowed to 
update its local time if it is guaranteed not to violate 
causality. Otherwise, it remains idle. The time step t 
is the index of the simultaneously performed update at¬ 
tempt. It corresponds to an integer wall-clock time with 
each processor attempting an update at each value of t. 

The simplest choice for a communication rule between 
processors, which is faithful to the original dynamics of 
the underlying system, is a short-range connection model 
(Fig-®. where, at any time step (t + 1), the fc-th PE is 
allowed to update if its local simulated time r k (t) is not 
greater than the local simulated times of its two nearest 
neighbors: 


Tfc(t) < mm{T k -i(t),T k+1 (t)} . (1) 

The periodicity condition requires communication be¬ 
tween the first and the last PEs. In effect, this update 
rule introduces a nearest neighbor interaction and corre¬ 
sponding correlations between PEs, which is analogous 
with non-equilibrium surface growth. It was shown ]23| ] 
that in the case of Ny = 1 the evolution of the virtual 
time horizon on coarse-grain length scales is governed by 
the Kardar-Parisi-Zhang (KPZ) equation |29|| : 

d t r = d xx T - \{d x T) 2 +rj(x,t), (2) 

where x is a spatial variable in a continuum limit, the 
constant A is related to the coarse-graining procedure, 
and r](x,t) accounts for random fluctuations. 

In PDES with a moving window constraint, the com¬ 
munication pattern between processing elements remains 
the short-range connection type but the new update rule 
requires that additionally at each (t + l)-step the local 
simulated time of the fc-th processor fits within a win¬ 
dow of width A that is defined relative to the slowest PE 
(i.e., the one with the smallest r) at time t. Explicitly, 
the A;-th PE is allowed the update if r k ( t ) simultaneously 


satisfies the short-range connection condition (]l|) and the 
following window condition: 


r k (t) < A + min {r k (t) :k=l,...,L}. (3) 

In the computer science community the minimum in 
Eq.(|) is called the global virtual time © i§|- 
From this definition it follows that the short-range con¬ 
nection model can be viewed as a particular case of the 
original update scheme when the width of the window is 
set to infinity, in which case condition (|3j) is trivially sat¬ 
isfied for all times. Thus an infinite window is equivalent 
to the absence of the constraint. 

In typical simulations, when the number of volume 
elements Ny is larger than the minimum Ny = 1 , a 
causality condition (Q) is enforced only for the border 
volumes on each PE. If, at any t-step, a randomly cho¬ 
sen volume element happens to be from the interior, i.e., 
when all of its immediate neighbors reside on one PE, 
then the PE always executes the update and its local 
time is incremented for the consecutive update attempt: 
r k (t+l) = T k (t)+r/ k (t), where r/ k is the fc-th random time 
increment that is exponentially distributed, randomly 
chosen independently on each PE and at each parallel 
step t. In the constrained simulations, condition (|l) is 
enforced for any randomly chosen volume element. 

In the conservative update scheme a causality require¬ 
ment is the main mechanism that generates correlations 
among processing elements. In the absence of a causality 
requirement local simulated times would be incremented 
independently of each other in the fashion of random de¬ 
position (RD) ||. However, even with this RD update 
rule, imposing the windowing condition (|sj) alone will 
give rise to correlations among processors. Note that 
the RD update rule does not belong to a class of con¬ 
servative update schemes i.e., it cannot faithfully simu¬ 
late the underlying system dynamics. For A-constrained 
RD simulations, the speed with which the correlations 
spread among all PEs is determined by the width of the 
A-window. 

For a set of L processing elements, a simulated time 
horizon (STH) is defined as a set of L local simulated 
times at a time step t. To study the roughening of the 
STH surface, we monitor the surface width ( w{t )}, which 
is defined in standard fashion || via the variance of the 
STH: 

= ^^(Tfc(f)-f(f)) 2 ^, (4) 

where the angle bracket denotes an ensemble average and 
f denotes the mean virtual time, f(t) = x Sfc=i T k(t)- 
Alternatively, the surface width can also be defined as 
the absolute standard deviation ( w a (t )) from the mean 
virtual local time: 

(w a (t)) = (5) 
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We use both definitions (|||]) in our analysis. To study 
the efficiency of an update process in the system of L 
processors, we define the utilization (it/, (t)) as a frac¬ 
tion of processors that performed an update at paral¬ 
lel time-step t. Throughout the paper we consistently 
use the following notation: the surface width { w(t )) is 
an ensemble average of w(t) = y/w 2 (t) computed at t, 
while (w) denotes the corresponding steady-state value 
at t —> oo. The subscript “a” denotes the width com¬ 
puted in accordance to (||), while subscripts “L” or “Afy” 
(e.g., (wl,n v )) indicate the parameter dependance of the 
width computed in accordance to (|[). 


III. SCALABILITY MODELING 

There are two important aspects of scalability, which 
should be dealt with in studies of algorithm efficiency. 
Both of them regard the time in the system and connect 
with data management issues. The first is the question 
of whether or not the utilization reaches a constant non¬ 
zero value in the limit of large system size (when L and 
Ny may arbitrarily vary). In particular, one needs to 
know if the “worst case scenario” of one volume element 
per PE can produce a non-zero utilization in the infi¬ 
nite L-limit. A zero value of utilization in the infinite 
system size limit would suggest that an algorithm would 
likely be useless for computationally intensive tasks on 
future generations of massively parallel computers, i.e., 
on systems that contain hundreds of thousands of pro¬ 
cessing elements || . The second question is the behavior 
of the evolution of the STH, whether or not the statis¬ 
tical spread of the STH saturates in time or scales with 
the system size. A negative answer to the latter question 
would suggest that an algorithm would probably prove 
impractical in actual applications, because the divergence 
of the STH width implies that, for most computationally 
intensive jobs, the data collection and averaging would 
impose a memory requirement per PE in excess of hard¬ 
ware capacities. 

In our scalability studies we exploit existing analo¬ 
gies between the time evolution of the STH and kinetic 
roughening in non-equilibrium surface growth; and we 
use selected results of non-equilibrium surface studies 
H [nj in analyzing the stochastic behavior of the sys¬ 
tem under consideration. The conservative short-range 
communication scheme between the system components 
can be regarded as an effective short-range interaction 
among PEs and treated in a similar manner to an inter¬ 
action among sites of any non-equilibrium surface, grow¬ 
ing on a regular lattice. For these surfaces, the lateral 
correlation length £ between sites follows the power law 
£ ~ t 1 / 2 , where z is the dynamic exponent. For a fi¬ 
nite system, £ cannot grow beyond the system size L 
and it is observed that for times much smaller than a 
cross-over time t x , t x ~ L z , the surface width increases 
in accordance to t where (3 is the growth exponent. 
For times much larger than the cross-over time, the sur¬ 


face width saturates and scales as L a , where a is the 
roughness exponent. The exponents satisfy the scaling 
relation: z/3 = a. The values of the exponents are inde¬ 
pendent of the details of the system and of the nature of 
the interactions between sites. Their values can be de¬ 
termined from the corresponding stochastic growth equa¬ 
tion, which defines the universality class. We observed 
that the simulated time horizon shows kinetic roughening 
and the typical scaling behavior Q: 

K(t)) ~ t 20 for t <C tx, (6) 

(■ w 2 L (t )} ~ L 2a for t»t x . (7) 

It was demonstrated in J23| that in the case of one site per 
PE (i.e., Ny = 1) the time evolution of the STH in the 
short-range connection model ( 0 ) follows the KPZ equa¬ 
tion (||) and direct simulations confirmed that the scaling 
exponents in Eqs. (|(||7j) have values consistent with the 
KPZ universality class (a = 1/2 and (3 = 1/3). 

A. Steady-state scaling for utilization 

As the time index advances the utilization falls from 
its initial maximal value at t = 0. Figure ^ presents the 
time evolution of the utilization for various system sizes 
in the basic PDES with short-range connections with the 
infinite A-window. For each of the system size, the uti¬ 
lization attains a steady-state, characterized by a non¬ 
zero value in the infinite t- limit. This qualitative result is 
also true for the simulations in two and three dimensions, 
when an individual PE is allowed to connect with four 
and six immediate neighbors, respectively p5|. Such a 



FIG. 2: Unconstrained PDES: Time evolution of the mean 
utilization ( u{t)) (averaged over N = 1024 independent ran¬ 
dom trials) for various system sizes: L = 10 and 10 4 and 
N v = 1,10 and 100. 












non-zero steady-state value is characteristic for the KPZ 
universality class and can be expressed by the Krug and 
Meakin |3^] relation for generic KPZ-like processes: 

< , ,, , , const. 

hm < Ui (i)> « ( Uoo ) + (8) 

where (u ac ) denotes the utilization in the infinite L- limit. 
Toroczkai et al j3f| showed that the basic conservative 
PDES with one site per PE satisfies relation (||), and they 
used it to extrapolate their utilization data to large L. 
Their value for the utilization in the infinite PE-limit is 
(uoo) = 24.6461(7)% H ||. This finding demonstrates 
that the simulation part of the algorithm is scalable in 
the case of the 1-d conservative PDES with the mini¬ 
mal number of volume elements per PE. Explicitly, this 
means that even in the worst case scenario, it is possible 
to run simulations arbitrarily long with a non-zero aver¬ 
age rate of progress. In the case of 2-d and 3-d PDES, 
the roughness exponents are a = 0.2 — 0.4 (in 2-d) and 
a = 0.08 — 0.3 (in 3-d) j2f|, and the Ny = 1 steady- 
state utilization can be estimated as (woo) — 12% and 
(u 0 o) — 7.5%, respectively ||25|| . 

B. The evolution of the simulated time horizon 

The unconstrained PDES are characterized by an infi¬ 
nite roughening of the STH surface in the limit of infinite 
system size. Figure |ii] presents a typical time evolution 
of a surface generated by this basic update scheme for 
Ny = 1 and L = 100. As the time index advances, the 
surface grows and the statistical spread of its interface 
increases in accordance with Eqs. Figure || shows 

the time evolution of the surface width for a few selected 
system sizes. For a fixed system size the width follows re¬ 
lations after the initial growth phase, the surface 

saturates and its width reaches the plateau value. By 
comparing the widths for Ny = 1 (Fig. [|a) to those for 
Ny = 10 (Fig. |Jb), one can see that for a fixed L-number 
of PEs, increasing the number of sites per PE shifts the 
cross-over time t x to later values and increases the value 
of the width at the plateau. This is expected since a 
larger value of Ny yields a larger cumulative value for 
the local time increment between two successive commu¬ 
nications with neighboring PEs. In the case of Ny = 1, 
the width of the STH approaches a finite constant for a 
finite L-number of PEs; however this constant diverges 
in the infinite L-limit in the power law fashion: 


(w 2 ) ~ L 2a , (9) 

which gives the linear divergence of the surface variance 
for the KPZ universality class. The same holds for the ex¬ 
treme fluctuations above and below the mean simulated 
time . This finding is also valid in the case when 
each PE carries a block of sites. The above scaling be¬ 
havior creates difficulties when intermittent data on each 
PE have to be stored until all PEs reach the simulated 
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FIG. 3: Unconstrained PDES: Time evolution of the time 
horizon for L = 100 PEs and Ny = 1 sites per PE. The lower 
surface is a snapshot at t = 2, the upper surface is a snapshot 
at t = 100. For L = 100 the crossover time is f x ~ 3700. 


time instant at which statistics collection is performed 
(e.g., a simple averaging over the full physical applica¬ 
tion). The diverging spread of the time horizon implies 
a diverging storage need for this purpose on every PE. 
Thus, the diverging width of the STH means that the 
memory requirement per processing element, for tempo¬ 
rary data storage, diverges as the number of processing 
elements gets arbitrarily large. Therefore, the measure¬ 
ment phase of an algorithm that follows the basic con¬ 
servative update scheme is asymptotically not scalable. 
In actual applications, the programmer must seek some 
means of constraining the infinite roughening of the STH 
or must impose some global synchronization on the sys¬ 
tem of PEs. 

Our and Lubachevsky’s earlier studies show that to 
make the conservative scheme efficient, one must use a 
large number of volume elements Ny. It is expected that 
an increase in Ny will modify the growth phase of the 
STH. In the case of large Ny, the initial growth phase 
(for 0 < t < t\) should be characterized by /3 = 1/2, 
typical for the RD universality class. Then, after the 
first cross-over time t\ (for ti < t < t 2 , where t 2 is the 
saturation time) the growth should be characterized by 
P = 1/3, typical for the KPZ universality class. In this 
way, making the simulation phase more efficient (by in¬ 
creasing Ny) will speed up the initial growth. Thus, 
the state savings, which are traditionally associated with 
optimistic schemes, are disadvantageous in conservative 
schemes. 
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FIG. 4: Unconstrained PDES: Time evolution of the mean surface width {w(t)) of the STH (averaged over N = 1024 independent 
random trials) for various number L of PEs, in simulations with: (a) Ny = 1 site per PE; (b) Ny = 10 sites per PE. Since 
the plateau has been reached for L = 10 and L = 100, the times t larger than 10 4 are not shown. For L = 10 4 , the plateau is 
reached for t larger than 10®. 


IV. CONSERVATIVE PDES WITH THE 
MOVING WINDOW CONSTRAINT 

A standard way of controlling the growth of the STH is 
to impose a constraint on its width in the spirit of parallel 
simulations of asynchronous cellular automata that was 
proposed by Lubachevsky 10- A straightforward ap¬ 
plication of this idea is the A-constrained update scheme 
which demands that at each update attempt a PE can 
perform an update only if its value of Tfc is within the 
window. The effect of condition (j|j) is that fast PEs are 
forced to postpone their updates until slower PEs catch 
up. In the simplified version studied here, the assigned 
distance apart is measured in terms of the processor lo¬ 
cal time that is compared to the global minimum virtual 
time. Since at each t the global minimum of the STH 
changes its location, so does the window for the update. 
In this sense a moving window constraint can be consid¬ 
ered as an implicit rule that induces global synchroniza¬ 
tion, in which each PE connects with the slowest PE. 
From the implementation point of view, the most im¬ 
portant questions are the scalability issues for realistic 
systems, where each PE may carry an arbitrary number 
of sites, because mainly these issues will determine the 
efficiency of the algorithm in actual applications. 


A. Simulation phase 

In the A-constrained PDES, simulations reach a steady 
state for an arbitrary system size in a similar fashion as 
in the basic short-range connection model, illustrated in 
Fig. I In general, for any A-value, when L is fixed the 


steady state value of the utilization gets larger as N v 
gets larger; and, when Ny is fixed it gets smaller as L 
is increased. This behavior reflects the strength of the 
correlations between PEs which arise due to the update 
rule (§). Namely, for fixed L , the frequency of an update 
per PE increases as Ny increases because condition ( I) 
does not need to be verified for the internal sites and the 
probability of randomly choosing a border site is 1/Ny. 
Therefore, in this case, correlations that arise due to the 
short-range connections between PEs weaken when Ny 
is increasing. In the infinite Vy-limit these correlations 
become negligible and the process of incrementing local 
simulated times resembles random deposition on the 1-d 
lattice of size L. Thus, the RD-limit is equivalent to the 
infinite Vy-limit of PDES. 

The mean steady state utilization {ul,Nv) as a func¬ 
tion of the system size is presented in Fig. [|. When the 
number Ny of sites per PE is increased the curves con¬ 
verge towards the RD limit. With a narrow A-window 
(Fig. ||a) the RD limit is approached fairly quickly (with 
Ny = 100 for A = 10), while with a wide A-window 
(Fig. ^]b) the RD limit is approached more slowly. For an 
infinite A-window the RD limit is (ul,oo) = 100%, which 
is the effect of no-correlations in the system in this lim¬ 
iting case. Obviously, (ul) = 1 /L x 100% when A = 0, 
because in this case only one PE is allowed to make an 
update. The RD curves in Fig. || display the steady state 
utilization for simulations that are governed only by the 
update rule (|^) alone, i.e., in the absence of other commu¬ 
nications between processing elements. The fall off in the 
RD utilization values with an increase in the number of 
PEs, indicates the strength of correlations between PEs, 
which exclusively results from imposing the A-window 
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constraint. When all three parameters, A, L and Ny, 
are allowed to vary in conservative PDES, the value of 
the utilization is mostly determined by the width of the 
moving window. The choice of a very narrow A-window 
severely suppresses the average progress rate. 

To determine a scaling relation for the steady state uti¬ 
lization in the infinite PE-limit, we analyzed the mean 
steady state utilization (up) as a function of 1/L for sev¬ 
eral values of A and Ny, in each case performing a stan¬ 
dard rational function interpolation j34| of the simulation 
data: 


(ul) 


(10 

i + E£iM£)* 

1 , y^A'n-l gfc + 1 (]_\ k 

_ Oo _ 1 + 1 q t \L) 

1 + Efi, Mi)* L i + E£iMi)‘ 


where the polynomial degrees K n and Kd were varied to 
determine the best set of the interpolation coefficients. 
Then, knowing the leading coefficients do and a±, we ex¬ 
trapolated the utilization values to L = oo. In the infinite 
L-limit the leading term in Eq. © is (itoo) = a 0 and we 
obtain the following scaling relation: 

/ \ / \ const. . . 

lim {u L ) = Woo) 4- r —• l 11 ) 

L—> oo Lj 


The mean utilization Woo) i n the infinite L-limit, as a 
function of Ny and the A-window size, is presented in 
Fig. ||. Data points for Ny = 10 8 represent extrapolated 
values for the A-constrained RD simulations. It can be 
observed that in the infinite L-limit, as well as at each up¬ 
date attempt and in the saturation limit, the utilization is 
strongly affected by the value of A. A narrow A-window 
can significantly slow down the system because a signif¬ 
icant number of PEs (that otherwise would perform an 
update) may be constrained to wait for the slowest ones 
to catch up. This effect is particularly noticeable when 
the number Ny of sites per PE becomes large. For ex¬ 
ample, for Ny = 100, when the A-window is narrowed 
to A = 1, the utilization may drop by as much as 65% 
from its value at A = 100. When A = 0, (itoo) = 0 f° r 
any Ny > 1. 

The standard %-error in our simulation data for the 
utilization at each Lstep does not exceed 1% when con¬ 
figurational averages are extracted from N = 1024 in¬ 
dependent random trials, except for the data obtained 
with the infinite window, which are within a 2% error 
bar (due to a smaller N). We estimate that our values 
for the steady state utilization in the infinite L-limit are 
well within a 2% relative uncertainty. The utilization 
data that are presented in Fig. || can be fitted to the 
approximate formula: 

u(N v , A) = u RD { A) x u K pz(Ny) p( ' A ’ Nv \ (12) 


where the first factor approximates the utilization 
curve in the RD limit, urd( A) = limjv v — >oo u(Ny, A). 


The base in the second factor approximates the uti¬ 
lization curve in the infinite A-limit, uxpz(Ny) = 
limA^oo u(Ny, A). Justification and the details of the 
fit (^||) are outlined in the Appendix. Here, u(Ny,A) 
denotes an approximate value of (un v , a}- 

A mean-field type argument can also be used to derive 
an approximate formula for uxpz(Ny): 

-TiWA ~ 1= ( S ~ Nr) Pw ’ ( 13 ) 

UKPz(Ny) \ Ay/ 

where S depends on Ny and is the average number of 
steps a PE waits given that it has to inquire about the 
virtual time on its neighboring PEs, when the simulations 
reach the steady state in the system of the infinite num¬ 
ber of PEs. Equation (jlj]) is valid for Ny > 3, where the 
mean-field approximation of replacing the average of a 
function with the function of the averages has been used. 
In justification for Eq. © we assume that a neighbor¬ 
ing PE has a virtual time which lags behind that of the 
checking PE, hence requiring the checking PE to wait. 
Let the total number of times on average a PE advances 
be ntot = noK + n w , where hok is the number of times 
it does not have to wait and n w is the number of times it 
has to wait for its neighboring PE. Then, in a mean-field 
spirit: u K pz(N v ) = n to t/(n 0 K + Sn w ) = l/(poK + Sp w ), 
where pok = noK/n to t and p w = n w /n to t ■ Prob¬ 
ability pox is the probability of not having to wait 
when either an interior site or a border site is chosen: 
pok = (Ny - 2 )/N v + (1 - p w ) (2 /Ny), where p w is the 
probability of waiting when either of the border sites is 
chosen. Combining ukpz and pok gives Eq. (Ill- 
Similar arguments can be used to derive an approxi¬ 
mate formula in the limit of large A: 

meW - 1 =( { - w) p ” + v~ 1+ w*) 

(14) 

where n depends on both Ny and A, and is the average 
number of steps a PE waits given that it does not have 
to wait for a neighboring PE but has to wait because of 
the A-window constraint. The meaning of 5 and p w is as 
in Eq. (|l3|). Let n w be the number of times the PE waits 
given that a border site has been chosen, and tta be the 
number of times a PE waits because the A-condition has 
been not satisfied either at the border or at the interior 
site. The corresponding probability pa is the probabil¬ 
ity of waiting because the A-condition is not satisfied. 
In justification for Eq. ( p~i| ) we assume that in the limit 
of large A, the events of violating the window condition 
at the border are almost disjoint from the events of vio¬ 
lating the nearest-neighbor update condition. With this 
assumption, no matter which is done, one cycle will be 
used to update the configuration, so the total number of 
updates is ntot = uok +n w +nA, while the number of cy¬ 
cles taken on average is uok + Sn w + kua- Defining the 
probabilities as above, yields the approximate relation 
©• Additional approximations can be made by assum¬ 
ing uniformly distributed waiting times. Note that for 










FIG. 5: Mean steady state utilization (u) in constrained PDES as a function of the system size for the A-window size: (a) 
A = 10; (b) A = 100. L is the number of PEs. When the number Ny of sites per PE is increased the curves converge towards 
the RD limit. 



FIG. 6: Mean utilization ( Uoo) in the limit of L —> oo as 
a function of the number of volume elements Ny and the 
A-window size. Data points for Ny = 10 s present the con¬ 
strained RD simulations. Symbols represent the simulation 
data. The lines are guides for the eyes. 


fixed Ny and A, both S and k can be measured indepen¬ 
dently of the utilization, thereby testing the mean-field 
spirit of the calculation. 


B. Measurement phase 

Direct simulations show that the A-constrained width 
of the STH is bounded: its absolute spread remains 
within the A-value for an arbitrary system size. This 
result should be expected since the update rule (j|) im¬ 
plies that independently of the system size, at each up¬ 
date attempt, the absolute deviation from the minimum 
cannot take on values much larger than A (if it does, 
the update does not happen). Thus, w a as well as w 
may not exceed A. The surface of the STH is effectively 
smoothed at each update attempt. Figure £i] shows the 
difference in roughening for two surfaces after t = 1000 
steps: the upper surface is obtained in simulations with¬ 
out the A-constraint, while the lower surface is obtained 
with A = 5. 

The A-constrained time surfaces exhibit the initial 
growth and the saturation at later times, similar to Fig. |J. 
However, a detailed analysis of the time evolution of the 
surface width suggests that, in general, they do not be¬ 
long to the KPZ universality class unlike surfaces gener¬ 
ated with A = oo. Figure || presents a typical behavior 
of the width for A = 10. In general, the transition to 
the saturated state takes place over a time interval (a 
“bump” in Fig. ||), whose length and position depends 
mainly on the A-value and cannot be characterized by 
a single crossover time. In the initial growth phase for 
t <C t p ( t p marks the beginning of the plateau, Fig. ||), the 
surface width scales as tP , i.e., for a fixed A and Ny, the 
growth is characterized by one value of exponent p for 
any L. In general, surfaces generated with various values 
of parameters A and Ny are characterized by various ef¬ 
fective values of the growth exponent /3. When A = oo, 
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FIG. 7: The roughening of the STH. For A = oo (the upper 
surface), the crossover time is t x ~ 4000, and for A = 5 (the 
lower surface) the width begins to saturate at t p fa 40. 
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(3 values are between the KPZ value of 1/3 (for Ny = 1) 
and the RD value of 1/2 (for Ny = oo) for small and in¬ 
termediate Ny and L. In the saturated phase, t p <C t, for 
a fixed value of A, the surface width ( wl,n v (/)) decreases 
with the system size, as can be observed from Fig. ||. The 
saturated surface width as a function of the system size 
is plotted for A = 100,10, 5,1 in Fig. ||. It can be seen 
that increasing the number of PEs and the number of 
sites per PE does not result in infinite roughening of the 
STH. 

The STH produced in the RD simulations with the in¬ 
finite A-window (in other words, in PDES with no com¬ 
munication between PEs) is characterized by a surface 
that is not self-affine |J. Nonetheless, the presence of a 
finite A-window constraint in the RD simulations forces 
the STH surface to saturate (Fig. ||). Therefore, this 
type of PDES no longer belongs to the RD universality 
class, characterized by (3 = 0.5 and a = oo. In the A- 
constrained PDES, the A-constrained RD surface is the 
limiting case when the number of sites per PE grows to 
infinity. 

An interesting feature in the surface width evolution 
graphs (Fig. || and Fig. © is the presence of a maxi¬ 
mum that marks the end of the growth phase. Its double 
peak structure can be explained, both quantitatively and 
qualitatively, in terms of simplex geometry |35| . In the 
set of L processing elements we distinguish between slow 
PEs (group (S)) and fast PEs (group (F)). At the t- th 
update attempt, the fc-th processor belongs to group (S) 
if its local time Tk(t) is less then or equal to the mean 
local time over all processors at the f-step. Otherwise, 
it belongs to group (F). One can define the variance w 2 


I I I I I I I I | I I I I I I I I | I I I I I I I I 
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FIG. 8: The time evolution of the mean STH surface width 
( w(t )) (averaged over N = 1024 independent random trials) 
in PDES with A = 10, for: (a) L = 100; (b) L = 1000. 
The curves are plotted for several Ny. Plots of ( w a (t )) look 
exactly the same. 


and the width w a for each group as follows: 




W) £ (WO-fM) 2 , 


(15) 


W a (X){t) 


1 L (.x)(t) 

W) £ A-nW-n<)|, 


(16) 


where “X” stands for either “S” or “F”, and L = 
L(s){t) + T(f)(/)- The variance w 2 and width w a of the 
STH can be expressed as the convex linear combinations: 


w 2 {t) = f(S)(t)wf S )(t) + f( F )(t)wf F) {t), (17) 
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FIG. 9: PDES with the A-window constraint: The steady state surface width (w) of the STH as a function of the system size, 
(a) A = 100; (b) A = 10; (c) A = 5; and, (d) A = 1. The curves are plotted for several values of volume elements Ny- The 
lines are guides for the eyes. 


Wait) = f{s){t)w a{S ){t) + f( F ){t)w a(F) (t), (18) 

where 1 = f {S )(t) + /( f ) (*), 0 < f(s),f(F) < 1- Explic¬ 
itly, w 2 and w a form a 1-d simplex with vertices at S 
and F. The coefficients and f( F ) are the fractions of 
slow and fast processors, respectively, in the system at 
each update attempt t. Figure |T(] shows the time evolu¬ 
tion of the surface widths ui a (g), tc a (F) and w a (Fig. |lC|a) 
and the corresponding fractional contributions /(g) and 
/( f ) (Fig.® for the first 500 simulation steps. Quan¬ 
titatively, the double peak in w a (t) (at about t = 10) 
presents the weighted sum of w a (s) and iu a (F) in accor¬ 
dance with Eq. which is evident by matching the 

width contributions (Fig. |To|a) with the corresponding 


fractional contributions (Fig. 0b) at each f-step. Qual¬ 
itatively, the decrease in surface widths for t > 10 is the 
effect of the constraint condition (|dj). In the particular 
case of simulations with A = 10 and Ny = 1000, illus¬ 
trated in Fig. 10, initially the majority of PEs belongs 
to the slow group (about 63% at t = 1, Fig. |Io]b), but as 
t advances the STH roughens and the population of the 
slow group falls while the population in the fast group 
grows. As the population of the fast group gets larger, 
the fraction u of PEs that are allowed to update falls be¬ 
cause some of the fast PEs violate condition (J^). While 
the fast PEs are waiting (i.e., no local time increments 
at the fast sites), the slow PEs are incrementing their 
local times, hence, the mean simulated time increases 
and therefore the deviation from the mean in Eq. (|h]) 
(and Eq. ©)) decreases for the fast PEs. This is the 
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main mechanism in the formation of the maximum in the 
w a (F) curve. Similarly, the first maximum in the u> a (s) 
curve is formed mainly due to the depopulation of the 
slow group. As the slow PEs are “catching up” with the 
fast PEs within the A-window for the update, the uti¬ 
lization u increases (20 <t< 50, Fig. 0>) and so do the 
widths. This secondary maximum is less pronounced be¬ 
cause the populations of the two groups are close apart. 
Eventually, after several cycles, the widths as well as the 
utilization reach steady values. 

In other words, the way in which the system undergoes 
the transition from the initial state to the steady state, 
observed in the above example, is a direct consequence of 
the window constrained update scheme i) and the par¬ 
ticular initial condition, in which all PEs enter simulation 
with their local times equal. If this initial condition of the 
full synchronization is changed, for example, by assum¬ 
ing that at t = 0 the local times are randomly distributed 
about some mean local time, the transition to the steady 
state will change its character. On the other hand, if at 
some later update attempt t s the system is synchronized 
(which is equivalent to setting all local simulated times 
to one value at t s ) then the recurrent time evolution will 
be repeated after t s until the steady state is attained. 

The above arguments can be restated in terms of the 
STH variance w 2 (taking Eq. ( 0 ) as the key to the analy¬ 
sis) for any system size. In the conservative PDES with a 
moving window constraint the system evolution towards 
a steady state follows essentially one path, along the 
above outline, for any value of the simulation parame¬ 
ters A, Ny and L. In our example we chose tentatively a 
very narrow A-window and a large Ny so the effect of the 
update scheme (||) is clearly pronounced in the evolution 
curves. In such a system,the correlations that arise due to 
the short-range connection update scheme (JlJ) are small 
relative to the correlations that arise due to the window 
constraint (jd|). Accordingly, in this case one can clearly 
deduce that a sharp fall in the utilization curve is the 
effect of a sharp population rise in the group of fast PEs. 
For example, one can read from Fig. [bjb that at t = 10 
about 25% of the PEs that did not make an update were 
mostly in the group of fast PEs, so approximately about 
one half of the fast PEs updated at this update attempt. 
A similar conclusion is certainly false when each PE in 
the system carries a small number of sites (e.g., Ny = 10) 
since in this case the correlations that originate due to 
the short-range connections between PEs may not be ne¬ 
glected and the utilization curve begins the fall at t = 0 
because the fast PEs and the slow PEs fail to satisfy con¬ 
dition © with approximately equal frequency. Opening 
the A-window wide (e.g., A = 100) effects the evolution 
curves in two ways. First, it slows down the build-up 
of the correlations that arise due to constraint !)■ This 
makes the growth phase longer so that a transition to the 
steady state takes place at later times and over extended 
time intervals. Secondly, it softens the correlations that 
arise due to constraint (^]), which smooths a transition to 
the steady state and the ripple-like features in the utiliza- 
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FIG. 10: PDES with A = 10, N v = 10 3 and L = 10 4 : (a) 
Time evolution of the surface widths; (b) Time evolution of 
%-fractional contributions to the surface widths. Subscripts 
(S) and (F) denote a fraction of processors in the slow and 
in the fast group, respectively; and u is the utilization. Con¬ 
figurational averages were taken over N = 1024 independent 
random trials. The lines are guides for the eyes. 


tion and the width curves (that are clear in Fig. 10) are 
only weakly present or vanish into statistical uncertain¬ 
ties. For example, in the worst case scenario of Ny = 1 
and A = 100, the time evolution towards the steady state 
follows the pattern typical for the KPZ universality class 
(A = oo), which suggests that the main correlation mech¬ 
anism results from the update scheme (|I]) in this case. 
Nonetheless, unlike the KPZ surfaces, the presence of 
the A-window prohibits the steady-state surface width 
to grow infinitely as the system gets larger. 
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V. DISCUSSION 

Our statistical analysis of the growing virtual time in¬ 
terface in conservative asynchronous PDES with a mov¬ 
ing window constraint, shows that in the steady state 
the average utilization remains finite (and non zero) and 
scales with the system size. Similarly, the statistical 
spread of the STH remains finite and scales with the sys¬ 
tem size. This was demonstrated for a range of volume 
elements from Ny = 1 to the RD-limit at Ny = oo. The 
convergence of the utilization to a finite value and the 
convergence of the time interface width to a finite value 
as the number of processing elements infinitely increases, 
reflects positively on the ability to efficiently implement 
this type of PDES in applications. In other words, with 
this global A-window constraint the simulation part as 
well as the measurement part of the algorithm are simul¬ 
taneously scalable. 

The practical questions that should be addressed in¬ 
volve suitable implementations of the algorithm, possi¬ 
ble modifications and generalizations that would facil¬ 
itate applications by optimizing performance and thus 
maximizing efficiency. Such questions would likely be 
non-universal, and hence depend on the explicit problem 
being simulated. 

It follows from our analysis that the utilization as de¬ 
fined by a fraction of working processors, is not a sole 
measure of the efficiency. However, it is an important 
component of the efficiency. The case of PDES with 
the basic conservative scheme, when the measurement 
phase is not scalable, suggests other important factors 
that should be considered in efficiency studies. The sec¬ 
ond important component is the statistical spread of the 
virtual time surface as it is measured by its variance or 
by its mean absolute deviation. The third important 
element is the frequency and the effect of extreme fluctu¬ 
ations in the virtual time interface. The forth important 
factor is the average progress rate, which could be mea¬ 
sured by the growth rate of the global minimum of the 
virtual time interface. An efficient algorithm should be 
characterized by the highest values of the utilization and 
the progress rate, while having small statistical spread 
in waiting times and should lack severe effects of the ex¬ 
treme time fluctuations. 

Applying the above recipe to conservative asyn¬ 
chronous PDES with a A-window constraint, the re¬ 
sults of our studies indicate that this kind of simulation 
presents a promise of becoming a good departure point 
towards the design of an efficient class of algorithms for 
asynchronous systems. The A-window constraint not 
only eliminates the extreme fluctuations in the virtual 
time horizon but also controls the statistical spread of 
the STH and controls the average progress rate. The 
width of the A-window can serve as a tuning parameter 
that, for a given volume load per processor, could be ad¬ 
justed to optimize the utilization so as to maximize the 
efficiency. 

In the conservative asynchronous PDES studied in this 


work, there is no condition imposed that would explic¬ 
itly synchronize a system in the course of the simula¬ 
tions. The system is fully synchronized only initially and 
undergoes desynchronization over time (i.e., over many 
parallel steps). The degree of this desynchronization is 
strictly related to the roughening of the STH. As the sim¬ 
ulations evolve, correlations between system components 
build up, which is reflected by changes in the morphol¬ 
ogy of the STH. There are two sources of correlations in 
the STH. The first is the nearest-neighbor communica¬ 
tion rule that, if acted alone, would eventually lead to 
the steady state, where the entire system is correlated. 
In the case of one volume element per PE, the time to 
the global correlation is on the order of L 3 / 2 . However, 
the presence of this global correlation does not cause an 
implicit synchronization nor does it lead to a state of 
near-synchronization. On the contrary, despite this cor¬ 
relation there are no global bounds on the roughening 
of the virtual time horizon: the larger the system, the 
more desynchronized it becomes over time. The nearest- 
neighbor communication rule is the essence of the con¬ 
servative scheme because it ensures that causality is not 
violated in any update. The second source of correlations 
is the constraint in the form of the moving window condi¬ 
tion. The moving window condition, if acted alone, would 
lead to the steady state, where the entire system is not 
only correlated but, also, is synchronized to some extent. 
The extent to which the system may become synchro¬ 
nized depends on the width of the moving window — the 
roughening of the virtual time horizon is constrained to 
the A-window width. Notice, the moving window condi¬ 
tion is not necessary for the conservative scheme. Its role 
is to ensure that infinitely large desynchronization will 
not happen. In this sense the constraint condition can 
be seen as an implicit synchronization protocol. In the 
constrained conservative PDES, the above two correla¬ 
tion mechanisms act together: the nearest neighbor con¬ 
nection rule explicitly guarantees causality and the con¬ 
straint rule implicitly guarantees a near-synchronization 
in an arbitrarily long sequence of update attempts. 


VI. SUMMARY AND OUTLOOK 

We considered the conservative parallel discrete event 
simulations with the moving window constraint and stud¬ 
ied the time evolution of the utilization as well as the time 
evolution of the stochastic time horizon by varying the 
system size (i.e., the number L of processing elements 
and the number Ny of sites per processing element) and 
by varying the width of the moving window. The results 
of our simulations indicate that this particular class of 
algorithms with the conservative update scheme gener¬ 
ally scales with the system size. The utilization reaches 
a steady state value after a finite number of simultane¬ 
ously performed parallel steps and approaches a finite 
non-zero value in the limit of infinite system size. This 
demonstrates that the simulation part of the algorithm 
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is scalable. The statistical spread of the stochastic time 
horizon is bounded by the size of the moving window 
constraint in the limit of the infinite system size, which 
shows that the measurement part of the algorithm is scal¬ 
able. In particular, in the limit of a large number of sites 
per processing element the results of the simulations ap¬ 
proach the constrained random deposition model, which 
is characterized by a high value of utilization while per¬ 
mitting effective data management. The simultaneous 
scalability of both phases of the algorithm is an impor¬ 
tant finding because it establishes a solid ground for the 
design of new class of efficient algorithms for parallel pro¬ 
cessing to model the evolution of spatially extended in¬ 
teracting systems with asynchronous dynamics. Further 
studies are required in the search for possible optimal 
implementations. For example, explicitly taking into ac¬ 
count the time required to find the global minimum of 
the STH at each step. 

Aside from practical aspects of the constrained parallel 
conservative discrete event simulations that are oriented 
to direct applications such as the scalability issues, there 
are several interesting physics questions that arise in con¬ 
nection with the stochastic time surface growth. These 
include the development of the lateral correlations and 
transient relaxation processes. We leave these questions 
open to possible future investigations. 
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APPENDIX: THE UTILIZATION DATA 


In the infinite L-limit the utilization is a two param¬ 
eter family of curves (Fig. |oj). The two limiting curves, 
urd{ A) and ukpz{ Ny), approach u = 1 in the infi¬ 
nite limit of their arguments. One can consider either 
urd or ukpz as an independent variable x and ex¬ 
press the utilization as y = y(x). We chose parame¬ 
terization by Ny, where x = x(Ny) = ukpz(Nv) and 
Ua(x) = 2/A ( x{Ny )). Figure 11 illustrates the idea by 
plotting the utilization y&.(x) for several values of A. The 
curves in Fig. [id] are a family of roots that, in first ap¬ 
proximation, could be expressed by y&{x) = a(A)x p ^\ 
where a( A) and p( A) have fractional values. To find a( A) 
and p( A) each curve is fitted to “the best” two-point for¬ 
mula. Then, sequences a(A) and p( A) are expressed by 
fit functions. 



FIG. 11: Family of utilization curves va{x) vs x = 
ukpz(Nv), illustrating the underlying idea of the fit. For 
Ai < A 2 < ■ • • < A = oo, 2/Aj < Va 2 < ■ ■ ■ < yoo = x. For 
A = 0, Va(x) = 0 (not shown). Symbols mark the simulation 
data. The cubic spline curves are guides for the eyes. 


A fit to a(A) is chosen in such a way that a(0) = 0 and 
a(oo) = 1. In Fig. [id], a(0) = 0 corresponds to y{x) = 0 
because A = 0 yields u = 0 for L = oo. Condition 
a(oo) = 1 corresponds to y[x) = x because A = oo 
means y = ukpz- Considering the limit behavior of 
Va(x), when x(Ny = oo) = 1 the coefficient a(A) must 
be interpreted as urd( A). This is also consistent with 
the alternative parameterization, where x = urd(A). 
Therefore, we directly identify a(A) with the approxi¬ 
mate expression for urd{ A). A four-point fit can be 
found as: 

u rd (A) « a( A) S-cT 1 - cT- ( AA ) 

A A e 3 A e4 


When C 3 = 15.8, e 3 = 1.07, c\ = 12.3 and e 4 = 1.18, 
fit (A.l) is good within ±2% relative error in the range 
0 < A < 00 . A simple two-point fit with C 3 = 3.47, 
e 3 = 0.84 and C 4 = = 0 approximates our simulated 

data within ±2.5% relative difference (Fig. |], utilization 
values for Ny = 10 s ). 

Considering the limits ukpz{Nv = 1) ~ 1/4 and 
ukpz{Nv = 00 ) = 1, a four-point fit to x(Ny) is: 


u K pz(N v ) = x(N v ) 


Cl 

Ny 


C2 

Ny 


(A.2) 


W hen c i = 2.3, ei = 0.96, c 2 = 0.74 and e 2 = 0.4, 
fit (Ad) is good to within ±2% relative error in the range 
1 < Ny < 00 . A simple two-point fit with ci = 3.0, 
ei = 0.715 and c 2 = e 2 = 0 approximates our simulated 
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data within ±2.5% relative difference (Fig. |j. utilization 
values for A = oo). 

In fitting the power p( A), the limits are p (A = oo) = 1 
and p(0) = 0. In Fig. [id], condition p (A = oo) = 1 
means y(x) = x. Condition p(0) = 0 expresses the fact 
that for small A the utilization depends mainly on A 
(not Ny) and, therefore, the exponent p( A) should be 
almost zero for small A. A simple two-point formula 
gives p(A) = 1/(1±2/A 3 / 4 ). With this p(A), a simple fit 
to u(Ny , A) ss a(A)x(Ny) p( - A ' 1 has a ±10% relative error 
when a(A) and x(Ny) are given by simple two-point fits. 
The actual power p depends weakly also on Ny, p = 
p(A,Ny). A four-point formula that accommodates the 
Ny dependance can be expressed as: 


The fit is good to within ±5% relative uncertainty 
for all A- and Ay-values when urd and ukpz are ex¬ 
pressed by four-point fits (A.1) and (A.2), respectively, 
and when p is given by (A.3) with the following fit param¬ 
eters: for Ny > 100: C 5 = 528.4, e$ = 1.487, cq = 515.1, 
ee = 1.609; for Ny < 10: C 5 = 17.43, e 5 = 1.406, 
cq = 15.3, e& = 1.687; otherwise: C 5 = 5.345, e$ = 0.627, 
cq = 0.095, ee = 0.045. 


P(A,Nv)Ki , C 5 (Ny) C 6 (lVy) ‘ (A ' 3) 

1 + \e 5 (N v ) ^egiNv) 
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